{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "正则表达式"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 常用符号"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### . ：匹配任意字符，换行符 \\n 除外；每个 . 表示一个占位符。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注释：使用re之前，要导入re库文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['xy1']\n"
     ]
    }
   ],
   "source": [
    "a = 'xy123'\n",
    "b = re.findall('x..',a)\n",
    "print (b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### *：匹配前一个字符的0次或无限次。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['xx', '', 'x', '', '', '', '', '']\n"
     ]
    }
   ],
   "source": [
    "a = 'xxyxy123'\n",
    "b = re.findall('x*',a)\n",
    "print (b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ? ：匹配前一个字符0次或1次。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['x', 'xy', 'xy']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = 'xxyxy123'\n",
    "b = re.findall('xy?',a)\n",
    "b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### .* ：贪心算法，尽可能的匹配多的字符。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['xxIxxfasdjifja134xxlovexx23345sdfxxyouxx']"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'\n",
    "b = re.findall('xx.*xx',secret_code)\n",
    "b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### .*? ：非贪心算法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['xxIxx', 'xxlovexx', 'xxyouxx']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'\n",
    "c = re.findall('xx.*?xx',secret_code)\n",
    "c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### () ：括号内的数据作为结果返回。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['I', 'love', 'you']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'\n",
    "c = re.findall('xx(.*?)xx',secret_code)\n",
    "c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## re.S的使用举例"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在Python的正则表达式中，有一个参数为re.S。它表示“.”（不包含外侧双引号，下同）的作用扩展到整个字符串，包括“\\n”。看如下代码："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b is []\n",
      "C is ['pass:\\n    234455\\n    ']\n"
     ]
    }
   ],
   "source": [
    "a = '''asdfsafhellopass:\n",
    "    234455\n",
    "    worldafdsf\n",
    "\n",
    "    '''\n",
    "b = re.findall('hello(.*?)world',a)\n",
    "c = re.findall('hello(.*?)world',a,re.S) #使用re.S\n",
    "print(\"b is\", b)\n",
    "print(\"C is\", c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "正则表达式中，“.”的作用是匹配除“\\n”以外的任何字符，也就是说，它是在一行中进行匹配。这里的“行”是以“\\n”进行区分的。a字符串有每行的末尾有一个“\\n”，不过它不可见。\n",
    "\n",
    "如果不使用re.S参数，则只在每一行内进行匹配，如果一行没有，就换下一行重新开始，不会跨行。而使用re.S参数以后，正则表达式会将这个字符串作为一个整体，将“\\n”当做一个普通的字符加入到这个字符串中，在整体中进行匹配。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## re.findall(pattern, string[, flags])："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "搜索string，以列表形式返回全部能匹配的子串。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "例子1："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['1', '2', '3', '4']\n"
     ]
    }
   ],
   "source": [
    "pattern = re.compile(r'\\d+')\n",
    "print (re.findall(pattern,'one1two2three3four4'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "例子2："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('I', 'love')]\n",
      "('I', 'love')\n"
     ]
    }
   ],
   "source": [
    "s2 = 'asdfxxIxx123xxlovexxdfd'\n",
    "\n",
    "f2 = re.findall('xx(.*?)xx123xx(.*?)xx',s2)\n",
    "\n",
    "print (f2)\n",
    "\n",
    "print (f2[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "刘德华 \n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "string = '刘德华 Andy Lau'\n",
    "pattern = '.*?\\s'\n",
    "s = re.match(pattern=pattern,string=string)\n",
    "print(s.group())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "刘德华 \n"
     ]
    }
   ],
   "source": [
    "print(s.group(0))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "273px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
